An Efficient Statistical Speech Act Type Tagging System for Speech Translation Systems
Abstract
This paper describes a new, efficient speech act type tagging system. The system covers two tasks: (1) segmenting a turn into the optimal number of speech act units (SA units), and (2) assigning a speech act type tag (SA tag) to each SA unit. Our method is based on a theoretically clear statistical model that integrates linguistic, acoustic and situational information. We report tagging experiments on Japanese and English dialogue corpora manually labeled with SA tags, and then discuss the performance difference between the two languages. We also report on translation experiments on positive response expressions using SA tags.

1 Introduction

This paper describes a statistical speech act type tagging system that utilizes linguistic, acoustic and situational features. This work can be viewed as a study on automatic "Discourse Tagging", whose objective is to assign tags to discourse units in texts or dialogues. Discourse tagging is studied mainly from two viewpoints, linguistic and engineering. The work described here belongs to the latter group. More specifically, we are interested in automatically recognizing the speech act types of utterances and in applying them to speech translation systems.

Several studies on discourse tagging to date have been motivated by engineering applications. The early studies by Nagata and Morimoto (1994) and Reithinger and Maier (1995) showed that dialogue act tags for the next utterance can be predicted with statistical methods. These studies, however, presupposed properly segmented utterances, which is not a realistic assumption. Automatic utterance segmentation (or discourse segmentation) is therefore desired here.

Discourse segmentation in linguistics, whether manual or automatic, has also received keen attention because such segmentation provides the foundation of higher discourse structures (Grosz and Sidner, 1986).
Discourse segmentation has also received keen attention from the engineering side because the natural language processing systems that follow the speech recognition system are designed to accept linguistically meaningful units (Stolcke and Shriberg, 1996). Much research has followed this line, such as (Stolcke and Shriberg, 1996) and (Cettolo and Falavigna, 1998), to mention only a few. We could use these studies as a preprocessing step for tagging. In this paper, however, we propose a statistical tagging system that optimally performs segmentation and tagging at the same time.

Previous studies such as (Litman and Passonneau, 1995) have pointed out that the use of multiple information sources can contribute to better segmentation and tagging, and so our statistical model integrates linguistic, acoustic and situational information. The problem can be formalized as a search problem on a word graph, which can be efficiently handled by an extended dynamic programming algorithm. In fact, we can efficiently find the optimal solution without limiting the search space at all. The results of our tagging experiments on Japanese and English corpora indicated high performance for Japanese but considerably lower performance for English.

This work also reports on the use of speech act type tags for translating Japanese and English positive response expressions. Positive responses appear quite often in task-oriented dialogues like those in our tasks. They are often highly ambiguous and problematic in speech translation. We will show that these expressions can be effectively translated with the help of dialogue information, which we call speech act type tags.

2 The Problems

In this section, we briefly explain our speech act type tags and the tagged data and then formally define the tagging problem.
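
As a rough illustration of what searching for a segmentation and a tag sequence at the same time can look like, the Python sketch below runs a dynamic program over all split points and tags of a turn. It is not the model defined in this paper: it assumes a plain word sequence rather than a word graph, and the names `segment_and_tag` and `toy_score`, the tags `ACCEPT` and `THANK`, and all score values are hypothetical, chosen only to make the example runnable.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Unit = Tuple[int, int, str]  # (start index, end index, tag) of one unit


def segment_and_tag(
    words: Sequence[str],
    tags: Sequence[str],
    unit_score: Callable[[Sequence[str], str, str], float],
    start_tag: str = "<s>",
) -> Tuple[float, List[Unit]]:
    """Jointly pick unit boundaries and tags that maximize the summed score.

    unit_score(unit_words, tag, prev_tag) is an assumed log-score function.
    No pruning is applied: every split point and tag pair is considered.
    """
    n = len(words)
    if n == 0:
        return 0.0, []
    if not tags:
        raise ValueError("at least one tag is required")
    # best[j][t]: best score for words[:j] when the last unit carries tag t
    best: List[Dict[str, float]] = [dict() for _ in range(n + 1)]
    back: List[Dict[str, Tuple[int, str]]] = [dict() for _ in range(n + 1)]
    best[0][start_tag] = 0.0
    for j in range(1, n + 1):          # end of the candidate unit
        for i in range(j):             # start of the candidate unit
            for prev_tag, prev_score in best[i].items():
                for tag in tags:
                    s = prev_score + unit_score(words[i:j], tag, prev_tag)
                    if s > best[j].get(tag, -math.inf):
                        best[j][tag] = s
                        back[j][tag] = (i, prev_tag)
    # Follow back-pointers from the best final tag to recover the units.
    tag = max(best[n], key=lambda t: best[n][t])
    total = best[n][tag]
    units: List[Unit] = []
    j = n
    while j > 0:
        i, prev_tag = back[j][tag]
        units.append((i, j, tag))
        j, tag = i, prev_tag
    units.reverse()
    return total, units


def toy_score(unit: Sequence[str], tag: str, prev_tag: str) -> float:
    """Hypothetical scorer: cue words vote for a tag; each unit costs 1.0."""
    cue = {"yes": "ACCEPT", "thanks": "THANK"}
    score = -1.0  # per-unit penalty, discourages needless splitting
    for w in unit:
        score += 1.0 if cue.get(w) == tag else -0.5
    return score  # prev_tag is ignored in this toy scorer


total, units = segment_and_tag(["yes", "thanks"], ["ACCEPT", "THANK"], toy_score)
print(total, units)
```

With the toy scorer, the two-word turn is split into two one-word units, because each cue word scores best under its own tag. The sketch takes time quadratic in the turn length and quadratic in the number of tags, which is affordable for short dialogue turns; a realistic scorer would replace `toy_score` with probabilities estimated from tagged data.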